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Abstract 

An algorithm for the numerical inversion of large matrices, the 
biconjugate gradient algorithm (BGA), is investigated in view 
of its use for Monte Carlo simulations of fermionic field theo- 
ries. It is compared with the usual conjugate gradient algorithm 
(CGA) and the minimal residue algorithm (MRA) within a Higgs- 



Yukawa model including mirror fermions. For this model it can 
be shown that BGA represents an improvement under certain 
circumstances where the other two slow down. 

1 Introduction 

The most time consuming part in numerical investigations of field theories 
including fermions on the lattice is the inversion of the fermion matrix during 
the Monte Carlo updating. If we call this matrix Q, then, using the Hybrid 
Monte Carlo (HMC) algorithm, for each leapfrog step the equation 

Q + Qx = b (1) 

has to be solved. Usually this is done by the conjugate gradient algorithm 
(CGA) |], H], [3|, which allows the inversion of a positive definite and Her- 
mitean N x N matrix within at most N iteration steps. Actually this is true 
only for an analytical inversion. Due to roundoff errors during a numerical 
inversion one has to set an error bound 5 , which terminates the algorithm 
when this bound is met. Nevertheless at least analytically CGA is forced to 
converge after a maximum number of steps. 

The minimal residue algorithm (MRA) |2|, |], |5| is another well-known 

2 



method for the inversion of large matrices. In some sense it is the opposite of 
CGA. Firstly, MRA is able to treat non-Hermitean and non-positive definite 
matrices, which on the one hand allows its application to more general prob- 
lems, and on the other hand allows a more flexible dealing with the matrix 
that has to be inverted (see below). Secondly, in contrast to CGA, it is not 
forced to converge within a predictable number of iterations. 

Because of the large amount of computer time for such iterative inversions 
a lot of effort has been spent in the acceleration of this part of the Monte Carlo 
algorithms. One important possibility to reach this aim is preconditioning 
[||, ||. Preconditioning means the transformation of the original matrix into 
another one, which can be treated more easily by the algorithm of inversion. 
If this matrix is "closer" to the unit matrix than the original matrix, its 
inversion will be faster. 

The structure of many matrices that result from the discretisation of 
differential equations are convenient for the so called "odd-even" or "red- 
black" preconditioning || [7]]. But this only works, if in equation flU) Q + and 
Q are inverted separately. Because both of these matrices are usually neither 
Hermitean nor positive definite this cannot be done by CGA, but e.g. by 
MRA. In this sense the latter is more flexible. 
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The effect of combining preconditioning with MRA was tested within 
QCD [|7| and a Higgs- Yukawa model, respectively ||. It was found that 
for small fermionic hopping parameters K or large fermion masses the use 
of MRA results in a considerable gain of computer time. For larger K the 
time requirements using MRA increase drastically, due to the fact that its 
convergence properties are worse than those of CGA. In those .fT-regions it 
appears that only the inversion by CGA remains feasible and therefore the 
odd-even preconditioning is no longer possible. 



Here the biconjugate gradient algorithm (BGA) [||, [Kj comes into play. 
It combines the advantages of both algorithms described above. On the 
one hand BGA, which can be considered as a generalisation of CGA, is also 
forced to converge like the latter. On the other hand it allows the inversion of 
non-Hermitean and non-positive definite matrices and therefore the separate 
treatment of Q + and Q like MRA. Investigations of some general properties 
of this algorithm have been done in numerical mathematics. But to the 
best of my knowledge it has so far not been considered for the inversion of 
fermion matrices. This article is devoted to the examination and comparison 
of the three algorithms within the U(1)l®U(1 ^-symmetric and the SU(2) L £g> 



SU(2)F>-symmetric Higgs- Yukawa model [O, O, [T3| . 



Before going into more detail, I would like to mention another possibility 
to accelerate the iterative matrix inversion, the so called "educated guess" , 
because it is used during all our calculations. It consists of guessing a first 
approximate solution X\ of equation (P, or corresponding equations, taking 
into account available foreknown information. In our case we suggest that the 
system under investigation and therefore the fermion matrix, which is a part 
of it, shows a continuous behaviour during each HMC trajectory. Therefore 
using previous solutions of ([!]) within the same trajectory, one can make an 
extrapolation to the solution of the actual equation, using the "Gottlieb- 
trick" |iLl}| . Keeping this in mind, it is plausible that the quality of the guess 



is strongly influenced by the step size e used during the HMC trajectory. If 
e = the solutions of two succeeding equations within a trajectory have to 
be the same. On the other hand, if e is very large, they have nearly nothing 
to do with each other. 

2 The Algorithm 

Let A be a complex NxN matrix, which need not to be Hermitean or positive 
definite, but invertible. To solve the equation Ax = b BGA constructs five 



sequences of complex N-component vectors r k ,r k ,p k ,p k ,x k by the following 



recurrence: 



(r k ,r k ) 
iP,A P ) 

rk+i = r k - a k Ap k 

r k+ i = f k -a* k A + p k 

x k+ i = x k + a k p k 
(f k+1 ,r k+1 ) 



Pk = 



(r k ,r k ) 
Pk+i = r k+1 + I3 k p k 

p k+ i = f k+l + (3* k p k . (2) 

The vectors are set initially to r\ = b — Ax±, where x\ is some guess of 
the solution, and f\ = pi — Pi — r±. The scalar product is defined as 
(a, b) = Eilx a*h. 

For this recurrence the biorthogonality condition, 

{f i ,r j ) = (r i ,r j ) = Vj<i , (3) 

and the biconjugacy condition, 

(Pi, A Pj) = ( A Pi,Pj) = Vj<i , (4) 



can be derived in the same way as it was done for real matrices A in [|T(] . 
The only difference is that now «j and fa are complex numbers and have to 
be treated differently for r i? pj and fj,pj, respectively. 

From the conditions above it follows that the r t form a sequence of lin- 
early independent vectors, unless the algorithm breaks down, if one of the 
denominators in (0) vanishes. Therefore BGA must terminate after m < N 
steps with r m+ i = 0, if rounding errors are neglected for the moment. The 
sequence X{ consequently leads to x m+ i being the solution of Ax = b. 

The above mentioned break down, owing to vanishing denominators, ap- 
parently occurs quite rarely || and has indeed never been observed to occur 
during the tests within the Higgs- Yukawa model. 

Due to roundoff errors during the numerical inversion some error bound 
5 must be set as for CGA. As these errors occur in the inversion of both Q + 
and Q, we have a complicated error propagation. In order to achieve the 
same precision as for the one-step inversion by CGA, we decrease the error 
bound for the separated inversions by a factor of 100. We have checked this 
for both MRA and BGA at several points and found that it is more than 
sufficient to get the desired precision. 
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3 Results 



Before comparing the runtime of the three algorithms, one can make some 
qualitative predictions. In order to do this let us have a closer look at the 
fermion matrices of Yukawa models with mirror fermions. For more details 



concerning the lattice action etc. see JTl], |T^, [L3||. The fermion matrix can be 
written as: 



Q{<f>) 



!)■'' 



Jyx 



G^t 1 

G^ x 

1 G 







G 



~ K Yl 5 V,x 



v 








1 
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1 








\ 



1 



x<i>Z j 



(5) 



where G 1 f,,G x are Yukawa couplings, <p is the Higgs field, K the fermion 
hopping parameter, already mentioned above, and 1 the two-dimensional 
unit matrix. The sum over /i goes over all eight lattice directions. The S M 
are connected to the Pauli matrices by S M = — S M = — io^ for /i = 1, 2, 3 and 



S4 = S4 = 1- For negative indices the definition is S M = — 

We distinguish three different kinds of parameters, which will be consid- 
ered separately. The Yukawa couplings and the fermionic hopping parameter 
K are called direct parameters, because they enter Q explicitly. Parameters 
like the scalar hopping parameter k and the quartic scalar self-coupling A 
affect Q only through the Higgs field configuration and are therefore called 
indirect parameters. Their influence on the performance of the inversion 
cannot be examined as easy as that of the first one. A third kind of pa- 
rameters is called algorithmic. Among them are the lattice size, the order 
of preconditioning P 0, 15], the leapfrog step size e, and the error bound S. 



Overrelaxation, which introduces a further algorithmic parameter u, is not 
possible here, because the relations ([|£D are valid only for the trivial case 
uj = 1. 



3.1 Algorithmic parameters 

The influence of the algorithmic parameters will not be examined in great de- 
tail here. But I would like to make some general remarks to give a qualitative 
picture. 
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1. Going to the next higher order of P in BGA, the amount of time for 
one iteration of the matrix inversion doubles, like it does in the MRA. 
In order to get a gain the number of iterations for one inversion must 
therefore decrease by a factor of more than 2. Altough this is the 
case for MRA in certain regions of the parameter space, it is usually 
not true for BGA. In regions where the BGA is competitive with the 
other algorithms, a factor of 1.7 or less is found. Only in those regions 
where MRA still represents the fastest method of inversion we can get 
an improvement by choosing some higher order of P. Because these 
parameter regions are not interesting for BGA we consider only first 
order preconditioning in all following calculations. 

2. In our calculations we have to fix the step size e at a value, where 
the acceptance rate for the HMC-updating is about 75%. Despite this 
one can vary e in order to see how the different algorithms behave. As 
mentioned above, e influences the educated guess based on the Gottlieb- 
trick, and therefore the number of iterations per inversion. Thus it 
is even possible that for certain large values of e, corresponding to 
acceptance rates far below 75%, it will be better to do the inversion 
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without educated guess, simply setting x% to zero. During all tests it 
turned out that CGA is more sensitive to changes in the quality of 
the educated guess, whereas BGA and MRA appear to be more stable. 
This means that it can depend on the value of e, which of the algorithms 
shows the best performance. If the acceptance rate is fixed to 75% it 
can not in general be foreseen whether the educated guess works better 
for CGA or BGA and MRA, respectively. 

3. Concerning error bound S and lattice size BGA and MRA behave quite 
similarly. The larger the lattice or the smaller the error bound the 
better both algorithms are compared to CGA. For a numerical com- 
parison of CGA and MRA in this respect see j5|. The numerical results 
discussed below refer to a 4 3 • 8 lattice and an error bound of 5 = 10 -8 . 

3.2 Direct parameters 

Now let us consider the direct parameters. From the prescription for the odd- 
even preconditioning it follows that it works the better the smaller K and 
the larger the Yukawa couplings are. For such values of the direct parameters 
MRA will always be the best choice. It will be better than CGA, because 
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preconditioning works well, and better than BGA, because it needs roughly 
half the amount of arithmetic for one iteration compared to the latter. If, on 
the other hand, the Yukawa couplings are small and the fermionic hopping 
parameter is large, an improvement by this kind of preconditioning is not 
possible. Therefore CGA will be faster than BGA and MRA in this case. 
The most favourable parameter region for BGA is thus expected to be at 
intermediate values of K, where MRA begins to break down and CGA slows 
down. Furthermore, when the Yukawa couplings strongly differ from each 
other, i.e. one of them has a large absolute value, while the other Yukawa 
coupling has a small one, then this is unfavourable for both CGA and MRA. 

In order to check the expectations stated above I have made several tests. 
If the scalar field is uniform CGA is the best algorithm for all parameter 
values under investigation. The other extreme consists of a randomly chosen 
scalar field. In this case it can be seen from figure 1 that for suitably chosen 
values of G$ and G x BGA is by far the fastest algorithm in a broad region of 
intermediate K. For disadvantageous values of the Yukawa couplings this K- 
region can shrink to zero and we have a situation like that depicted in figure 
2, where for each value of K CGA is better than BGA. An intermediate 
case is shown in figure 3. Here for small values of K BGA is faster than 
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CGA, but it is exceeded by MRA. It follows a small region with nearly the 
same performance of BGA and CGA, before at larger values of K CGA again 
becomes faster. This figure comes from a calculation within the SU(2)-version 
of the model. Of course the extreme cases, depicted for the U(l)-version in 
figure 1 and 2, are also existent in the SU(2) model. From the general 
discussion above we expect that the relative behaviour of the algorithms 
is the same for the SU(2) model and for the U(l) model. For this reason 
and in order to save computer time the following tests are done within the 
U(l)-version. 

The dependence on the Yukawa couplings can be read off from figure 4. 
There all parameters except G x are fixed. When G x moves away from to 
larger values, only BGA remains fast, whereas CGA slows down considerably 
and for MRA convergence is not observed beyond a certain value of G x -value. 

3.3 Indirect parameters 

Considering the indirect parameters, k is the most interesting one, because 
no large effects from A are expected even if it is varied over a large range. 
If all other parameters are held fixed it depends on n whether the system is 
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in the PM, FM, AFM or FI phase ||16||, i.e. it determines the magnetisation 



|0| = (1/N)\ J2x where the sum extends over the N lattice sites. For BGA 
the amount of time for one inversion is of the same magnitude, no matter 
whether we set randomly (|0| « 0) or choose a uniform configuration 
(|0| = 1). The performance of CGA, however, is strongly influenced by the 
magnetisation. CGA needs, depending on the other parameters, up to 60 
times longer for a random configuration, than for a uniform one. (This is the 
largest factor found during the tests, but probably not the largest possible.) 
Concerning this aspect MRA shows opposite behaviour to CGA. It needs 
more time for inverting a matrix with uniform configuration. 

These results suggest that BGA is the most promising candidate in regions 
of intermediate magnetisation. In fact, this expectation is confirmed by table 
1. The table shows the time requirements for one inversion, needed by the 
different algorithms. Three types of scalar field configurations are considered: 
random, uniform and equilibrated by 500 trajectories for k = 0.0 and k = 
±0.1. This values of « cover the FM, PM and AFM phase [[RJ. 

The relative speed of algorithms in table 1 depends only little on the ac- 
tual scalar field configurations. One can observe, following the development 
of magnetisation and time requirements for the inversion during the equili- 
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bration, that at some value of \<p\, BGA will become faster than CGA for 
suitable k (see also table 3 for another set of parameters). For the particular 
parameters of table 1 BGA represents the best inversion method only for a 
small region of |0|, i.e. k. Moreover, these values of k are of little physical 
interest, because the system is not near a phase transition. 

The picture changes drastically if we choose G$ to be zero, instead of 0.1. 
This choice is physically important since it represents the decoupling scenario 



[§, which requires K = 0.125 in addition. The time requirement for one 
inversion then increases by a large factor for all three algorithms, but for 
suitable K this results in an advantage for BGA. It is the fastest algorithm 
for all physically interesting magnetisations, and is beaten by CGA only 
deeper in the FM phase. To demonstrate this I consider K = 0.125 and 
three values of k. Two of them are placed near the second order phase 
transition between the FM and PM phases (k = —0.04, —0.06), the third one 
at the transition between the PM and AFM phases (k = —0.173). The results 
on the algorithmic speeds are summarized in table 2. The development of 
the time requirements and of the magnetisation during the equilibration are 
shown in table 3. One observes that after some equilibration BGA is the 
best choice for this set of parameters, if the magnetisation is not too large. 
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The MRA did not show convergence or was very slow for all values of k. 
The discussion above shows that three different cases occur: 

1. If the values of G^,G X ,K and k are most favourable for BGA, then 
after equilibration, starting from an uniform configuration, BGA is the 
fastest algorithm and has to be used for the production runs. 

2. If the direct parameters have somewhat less suitable values, there are 
still values of k for which BGA is faster than CGA, but if the magneti- 
sation is too small, it will be beaten by MRA (see table 1). In such 
a situation it might happen that BGA is the best method of inversion 
only in a small, physically uninteresting region or even for no value of 
k at all. 

3. In the third case the direct parameters are not appropriate for odd- 
even preconditioning and CGA will always be the fastest algorithm, no 
matter how large n is. 
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4 Summary and Conclusion 

The biconjugate gradient algorithm (BGA) for the inversion of large matrices, 
which combines advantages of the conjugate gradient (CGA) and minimal 
residue algorithms (MRA), has been investigated. In the framework of Higgs- 
Yukawa models it turns out that there exist regions in parameter space where 
the other two algorithms slow down and BGA represents an improvement. To 
be precise, BGA can be the optimal choice in this model, if the fermionic hop- 
ping parameter assumes intermediate values, such that the fermionic mass 
is near zero, and if both Yukawa couplings differ strongly in magnitude, as 
is the case in the decoupling situation. Furthermore the magnetisation must 
not be too large, because otherwise CGA is the best choice. For small mag- 
netisations it depends on the particular choice of direct parameters, whether 
BGA is beaten by MRA. 

In those parameter regions where BGA is the fastest algorithm, using 
higher order preconditioning for BGA does not result in a further gain. Sim- 
ilar to the case of MRA, going to larger lattices or smaller error bounds makes 
BGA more attractive. 

Two further things should be mentioned here. For some Monte Carlo 
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algorithms, like e.g. the Langevin algorithm, the updates require the inversion 
of Q only, instead of Q + Q. This reduces the amount of time for BGA or MRA 
inversions by a factor of two, which is of course not the case for CGA. 

Secondly, taking the matrix Q or Q + itself, without preconditioning, i.e. 
without odd-even decomposition, as input for BGA or MRA no gain is found 
compared to CGA, in contrast to the suggestion of ||. Maybe this is due to 
the special form of our matrix. 
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Figure captions 

Fig. 1. Comparison of BGA, CGA and MRA for the U(l)-model on a 
4 3 • 8 lattice with random scalar field configuration. The parameters are 
A = 10, k = 0, = 0.1, G x = 1.0. The figure shows the time needed for 
a matrix inversion in seconds versus the fermionic hopping parameter. The 
symbols are crosses (BGA), squares (MRA) and diamonds (CGA). 

Fig. 2. The same as figure 1, but at G x = 0.3 

Fig. 3. Comparison of matrix inversion algorithms for the SU(2)-model at 
A = 10~ 6 , k = 0.0, G^p = 0.3, G x = 0.0 on a 4 3 • 8 lattice with a random scalar 
field configuration. The same symbols as in figure 1 are used. 

Fig. 4. Comparison of matrix inversion algorithms at fixed K and varying 
G x for the U(l)-model on a 4 3 • 8 lattice with a random scalar field configu- 
ration. The parameters are A = 10, k = 0, G$ = 0.0, K = 0.125. The figure 
shows the time needed for a matrix inversion in seconds versus the Yukawa 
coupling G x . The symbols are as in figure 1. 
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Table 1: Time in seconds for one inversion of the fermion matrix at different 
magnetisations (random, equilibrated at k — —0.1,0.0 and +0.1, and uni- 
form). The parameters are Gj, = 0.1, G x = -1.0, A = 1,K = 0.125. The 
numbers for \<f>\ ps are averages over random configurations. 



101 


CGA 


BGA 


MRA 


~ 


0.882 


0.255 


0.123 


0.084 


0.67 


0.42 


0.267 


0.275 


0.61 


0.45 


1.05 


0.83 


0.368 


0.443 


3.11 


1 


0.23 


0.34 


3.67 
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Table 2: Time in seconds for one inversion of the fermion matrix at differ- 
ent magnetisations (random, equilibriated at k — —0.173,-0.06 and —0.04, 
and uniform). The parameters are = 0.0, G x = —1.0, A = 10. Empty 
places mean that the MRA did not show convergence within an acceptable 
amount of iterations. The numbers for \<p\ are averages over random 
configurations. 
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CGA 


BGA 


MRA 
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13.720 


1.650 




0.07 


1.040 


0.630 


1.49 


0.168 


0.800 


0.640 


4.50 


0.259 


0.547 


0.622 




1 


0.282 


0.379 
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Table 3: Time in seconds for one Hybrid Monte Carlo trajectory, averaged 
over the first n = 100 trajectories starting with a uniform configuration, over 
the next n = 300 trajectories, and over the last n = 300 trajectories. The 
parameters are the same as for table 2 and the step size is e = 0.04. < \<f>\ > 
is the magnetisation averaged over n measurements. 
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<|0|> 
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CGA 


BGA 




0.51 
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434 


-0.04 


0.23 


300 


1270 


1470 




0.23 


300 
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-0.06 


0.205 
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1450 




0.175 


300 


1510 


1480 




0.37 
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407 
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0.085 
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2120 


1490 




0.068 


300 


2320 


1560 
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